Word Sense Induction Using Graphs of Collocations

Authors

  • Ioannis P. Klapaftis
  • Suresh Manandhar
Abstract

Word Sense Induction (WSI) is the task of identifying the different senses (uses) of a target word in a given text. Traditional graph-based approaches create and then cluster a graph in which each vertex corresponds to a word that co-occurs with the target word, and edges between vertices are weighted by the co-occurrence frequency of their associated words. In contrast, in our approach each vertex corresponds to a collocation that co-occurs with the target word, and edges between vertices are weighted by the co-occurrence frequency of their associated collocations. A smoothing technique is applied to identify more edges between vertices, and the resulting graph is then clustered. Our evaluation under the framework of the SemEval-2007 WSI task shows that (a) our approach produces clusters that conflate senses less than those produced by traditional graph-based approaches, and (b) our approach outperforms the existing state-of-the-art results.
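
The graph construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: a collocation is taken to be any unordered pair of words in a context of the target word, the `min_freq` threshold is hypothetical, edges are weighted with a Jaccard-style ratio over shared contexts, and connected components stand in for the clustering step. The smoothing step mentioned in the abstract is omitted because the abstract does not specify it.

```python
from collections import Counter
from itertools import combinations


def collocations(context, target):
    """Unordered pairs of distinct context words, excluding the target (assumed definition)."""
    words = sorted(set(context) - {target})
    return set(combinations(words, 2))


def build_graph(contexts, target, min_freq=2):
    """Vertices are frequent collocations; edges join collocations that share contexts."""
    per_context = [collocations(c, target) for c in contexts]
    freq = Counter(col for cols in per_context for col in cols)
    # Keep collocations seen in at least min_freq contexts (hypothetical threshold).
    vertices = {col for col, n in freq.items() if n >= min_freq}
    joint = Counter()
    for cols in per_context:
        kept = sorted(cols & vertices)
        joint.update(combinations(kept, 2))
    # Assumed weight: shared contexts divided by contexts containing either collocation.
    edges = {(a, b): n / (freq[a] + freq[b] - n) for (a, b), n in joint.items()}
    return vertices, edges


def components(vertices, edges):
    """Connected components, used here as a simple stand-in for graph clustering."""
    adj = {v: set() for v in vertices}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, clusters = set(), []
    for v in sorted(vertices):
        if v in seen:
            continue
        stack, comp = [v], set()
        while stack:
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(adj[u] - comp)
        seen |= comp
        clusters.append(comp)
    return clusters


# Toy contexts for the target word "bank" (invented for this example).
contexts = [
    ["bank", "river", "water", "shore"],
    ["bank", "river", "water", "shore", "fishing"],
    ["bank", "money", "loan", "interest"],
    ["bank", "money", "loan", "interest", "account"],
]
vertices, edges = build_graph(contexts, "bank")
clusters = components(vertices, edges)
for cluster in clusters:
    print(sorted(cluster))
```

On this toy input, each induced cluster groups collocations from one use of "bank". A real system would need a proper collocation extractor, the smoothing step, and a clustering algorithm suited to weighted graphs.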

Similar resources

Class-based collocations for Word Sense Disambiguation

This paper describes the NMSU-Pitt-UNCA word-sense disambiguation system participating in the Senseval-3 English lexical sample task. The focus of the work is on using semantic class-based collocations to augment traditional word-based collocations. Three separate sources of word relatedness are used for these collocations: 1) WordNet hypernym relations; 2) cluster-based word similarity classes...

NEUNLPLab Chinese Word Sense Induction System for SIGHAN Bakeoff 2010

This paper describes a character-based Chinese word sense induction (WSI) system for the International Chinese Language Processing Bakeoff 2010. By computing the longest common substrings between any two contexts of the ambiguous word, our system extracts collocations as features and does not depend on any extra tools, such as Chinese word segmenters. We also design a constrained clustering alg...

Using Multiple Knowledge Sources for Word Sense Discrimination

This paper addresses the problem of how to identify the intended meaning of individual words in unrestricted texts, without necessarily having access to complete representations of sentences. To discriminate senses, an understander can consider a diversity of information, including syntactic tags, word frequencies, collocations, semantic context, role-related expectations, and syntactic restric...

Discrimination of Word Senses with Hypernyms

Languages are inherently ambiguous. Four out of five words in English have more than one meaning. Nowadays there is a growing number of small proprietary thesauri used for knowledge management for different applications. In order to enable the usage of these thesauri for automatic text annotations, we introduce a robust method for discriminating word senses using hypernyms. The method uses coll...

Preposition Semantic Classification via Treebank and FrameNet

This paper reports on experiments in classifying the semantic role annotations assigned to prepositional phrases in both PENN TREEBANK (version II) and FRAMENET (version 0.75). In both cases, experiments are done to see how the prepositions can be classified given the dataset’s role inventory, using standard word-sense disambiguation features, such as the parts of speech of surrounding words, a...

Publication date: 2008